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Abstract. In this paper we study from a numerical analysis perspective the Fractional Step 
Kinetic Monte Carlo (FS-KMC) algorithms proposed in [l] for the parallel simulation of spatially 
distributed particle systems on a lattice. FS-KMC are fractional step algorithms with a time-stepping 
window At, and as such they are inherently partially asynchronous since there is no processor com- 
munication during the period At. In this contribution we primarily focus on the error analysis of 
FS-KMC algorithms as approximations of conventional, serial kinetic Monte Carlo (KMC). A key as- 
pect of our analysis relies on emphasizing a goal-oriented approach for suitably defined macroscopic 
to/} observables (e.g., density, energy, correlations, surface roughness), rather than focusing on strong 

3 ■ topology estimates for individual trajectories. 

One of the key implications of our error analysis is that it allows us to address systematically the 
processor communication of different parallclization strategics for KMC by comparing their (partial) 
asynchrony, which in turn is measured by their respective fractional time step At for a prescribed 
error tolerance. 
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1. Introduction. The simulation of stochastic lattice systems using kinetic Monte 
Carlo (KMC) methods relies on the direct numerical simulation of the underlying 
Continuous Time Markov Chain (CTMC). In [T] we proposed a new mathematical 
and computational framework for constructing parallel algorithms for KMC simula- 
tions. The parallel algorithms in [T] are controlled approximations of Kinetic Monte 
Carlo algorithms, and rely on first developing a spatio-temporal decomposition of the 
Markov operator for the underlying CTMC into a hierarchy of operators correspond- 
ing to the particular parallel architecture. Based on this operator decomposition, we 
formulated Fractional Step Approximation schemes by employing the Trotter product 
formula, which in turn determines the processor communication schedule. The frac- 
tional step framework allows for a hierarchical structure to be easily formulated and 
implemented, offering a key advantage for simulating on modern parallel architectures 
with elaborate memory and processor hierarchies. The resulting parallel algorithms 
are inherently partially asynchronous as processors do not communicate during the 
fractional time step window At. 

Earlier, in |23j the authors also proposed an approximate algorithm, in order 
to create a parallelization scheme for KMC. It was demonstrated in [THl HO], that 
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boundary inconsistencies are resolved in a straightforward fashion, while there is an 
absence of global communications. Finally, among the parallel algorithms tested in 
[T§] , the one in [23] had the highest parallel efficiency In [I] , we demonstrated that the 
approximate algorithm in |23j is a special case of the Fractional Step Approximation 
schemes introduced in pQ. There we also demonstrated, using the Random Trotter 
Theorem [13] , that the algorithm in [23j is numerically consistent in the approximation 
limit, i.e., as the time step in the fractional step scheme converges to zero, it converges 
to a Markov Chain that has the same master equation and generator as the original 
serial KMC. The open source SPPARKS parallel Kinetic Monte Carlo simulator, 
|20j . can also be formulated as a Fractional Step approximation. In this article, the 
convergence, reliability and efficiency of all such Fractional Step KMC parallclization 
methods is systematically explored by rigorous numerical analysis which relies on 
controlled-error approximations in transient regimes relevant to the simulation of 
extended systems. 

A key aspect of the presented analysis relies on emphasizing a goal-oriented error 
approach for suitably defined macroscopic observables, e.g., density, energy, corre- 
lations, surface roughness , giving rise to estimates which are independent of the 
(very large) system size of the particle system. Besides the obtained numerical con- 
sistency and reliability of the approximating CTMC obtained from FS-KMC there 
is an additional key practical point: the bigger is the allowable At, within a desired 
error tolerance, the less processor communication is required. From a broader math- 
ematical perspective, and driven from parallel computing challenges, the developed 
mathematical and numerical analysis attempts to balance between controlled error 
approximations and processor communication. The same methods could also prove 
useful for developing and evaluating parallel numerical schemes for other molecular 
and extended systems. 

2. Background. Kinetic Monte Carlo (KMC) algorithms have proved to be 
an important tool for the simulation of non- equilibrium, spatially distributed chemical 
processes arising in applications ranging from materials science, catalysis and reaction 
engineering, to complex biological processes. Typically the simulated models involve 
chemistry and/or transport micro-mechanisms for atoms and molecules, e.g., reac- 
tions, adsorption, desorption processes and diffusion on surfaces and through porous 
media, [T31 [HI H] - Furthermore, mathematically similar mechanisms and correspond- 
ing KMC simulations arise in agent-based models in epidemiology, ecology and traffic 
networks, [24] . 

We consider an interacting particle system defined on a d-dimensional lattice Ajy. 
Naturally, the simulations are performed on a finite lattice of the size N, however, 
given the size of real molecular systems it is either necessary to treat the case N — > 
oo, e.g., A = Z d , or alternatively any numerical estimates we obtain need to be 
independent of the system size N. We restrict our discussion to lattice gas models 
where the order parameter or the spin variable takes values in a compact set, in most 
cases the set is finite £ = {0, 1, . . . , K}. At each lattice site x £ An an order parameter 
(a spin variable) o~(x) £ E is defined. The states in E correspond to occupation of 
the site x £ A^r by different species. For example, if £ = {0, 1} the order parameter 
models the classical lattice gas with a single species occupying the site x when a(x) = 1 
and with the site being vacant if cr{x) = 0. We denote {(Tt}t>o the stochastic process 
with values in the countable configuration space S = E Ajv . Microscopic dynamics 
is described by transitions (changes) of spin variables at different sites. We study 
systems in which the transitions are localized and involve only finite number of sites at 
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each transition step. First, the local dynamics is described by an updating mechanism 
and corresponding transition rates c{x,ui\a) in (|2.1[) . such that the configuration at 
time t, o~t = cr changes into a new configuration a x ' u by an update in a neighborhood 
of the site x £ Ajv. Here u> G S x , where S x is the set of all possible configurations 
that correspond to an update at a neighborhood Vt x of the site x. For example, if the 
modeled process is a diffusion of the classical lattice gas a particle at x, i.e., o~(x) can 
move to any unoccupied nearest neighbor y of x, i.e., Q x = {y G An \ \x — y\ = 1} and 
S x is the set of all possible configurations S x = E nx . Computationally the sample 
paths {<7 t }t>o are constructed via KMC, that is through the procedure described in 
flUJ) and below. 

The studied stochastic processes are set on a lattice (square, hexagonal, etc.) 
An with N sites, they have a discrete, albeit high-dimensional, configuration space 
S and necessarily have to be of jump type describing transitions between different 
configurations a G S. Mathematically, a CTMC is a stochastic process {at} defined 
completely in terms of the local transition rates c(a, a') which determine the updates 
(jumps) from any current state a t = a to a (random) new state a'. In the context 
of the spatially distributed applications in which we are interested here, the local 
transition rates will be denoted as 

c(o-,o-') = c(x,uj;(t) , (2.1) 

which correspond to an updating micro-mechanism from a current configuration a t = 
a of the system to a new configuration a x,u by performing an update in a neighborhood 
of each site x G A^r- Here w is an index for all possible configurations S x that 
correspond to an update at a neighborhood fi x of the site x; we refer to the end of 
the section for specific examples. 

The probability of a transition over an infinitesimal time interval St is 

P (at+st = cr x ' u | a t = a) = c(x, u; a)St + o(5t) . 

Realizations of the process are constructed from the embedded discrete time Markov 
chain S n = er tn (see [H]), corresponding to jump times t n . The local transition rates 
(|2.1j) define the total rate 

X ^= E E 4*,^), ( 2 - 2 ) 

which is the intensity of the exponential waiting time for a jump from the state a. 
The transition probabilities for the embedded Markov chain {S n } n >o are 

V{^) = C ^P^. (2-3) 

In other words once the exponential "clock" signals a jump, the system transitions 
from the state a to a new configuration a x - u with the probability p{a,a x u ). On 
the other hand, the evolution of the entire system at any time t is described by the 
transition probabilities P(a, t\ £) := P (a t = a \ a = £) where £ G S is an initial con- 
figuration. The transition probabilities corresponding to the local rates (|2.1[) satisfy 
the Forward Kolmogorov Equation (Master Equation), [151 15], 

d t P(a,t;0~ ]T c(a',a)P(a',t;0-H<r)P(c-,t;0, (2-4) 

a' .a'^a 
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where P((J, 0; C) = S(a — C) an d S(a — C) = 1 if cr = £ and zero otherwise. 

In [T] we developed a general mathematical framework for parallelizable approxi- 
mations of the KMC algorithm. Rather than focusing on exactly constructing stochas- 
tic trajectories in (|2.2[) and (|2.3[) . we proposed to approximate the evolution of ob- 
servables / = /(er) £ 0,(5), i.e., of bounded continous functions on the configuration 
space S. The space of bounded continuous functions, Cb(S), is regarded as a Banach 
space with the norm 

ll/!loc = sup|/(a)|. 

Here we consider obscrvables/functions /(cr) depending on large number of variables 
a(x), x £ An, such as coverage, surface roughness, correlations, etc., see for instance 
the examples in Section [5] Alternatively, we may consider obscrvablcs depending on 
infinitely many variables o~(x), x £ A = Z d , to stress the necessity of working with 
the infinite volume limit. 

Typically in KMC we need to compute expected values of such observables, that 
is quantities such as 

u(C,t) := E C [/(<*)] =^/W%«) , (2.5) 

cr 

conditioned on the initial data Co = £. By a straightforward calculation using (|2 .4[) 
we obtain that the observable (|2.5[) satisfies the initial value problem 

d t u((,t) = £u((,t) , u(C,0)=/(C), (2-6) 

where the operator £ : Cb(S) —> Cb{S) is known as the generator of the continous 
time Markov chain, [15], and in the case of (|2.ip it is 

Cf{a)=Y J <<r,a')[f{a')-f{a)]= £ £ c(x, W ; - /(a)] . (2.7) 

We then write (|2.5[) . as the the action of the Markov semi-group e tc associated with 
the generator C and the process {at}t>o, [H], on the observable / 

u(C,t)=E<[f(a t )] =e* £ /(C), (2.8) 

where denotes the expected value with respect to the law of the process {at}t>o 
conditioned on the initial configuration (. 

We define a difference operator S x f as an analogue of a derivative. Higher-order 
derivative analogues are defined in Section [5] when needed in the error analysis. We 
define a corresponding function space, which is necessary in order to set up the semi- 
group P = e tc when we consider the infinite lattice A = Z d or to obtain estimates 
which are independent of the system size N when considering the lattice A^r in Sec- 
tion [5] 

Definition 2.1. Let f £ Cb(S) then for any x e A N we define 
S x , u f(a) = f(o*< u ) - /(a) . 

We define the norm 

||/||l=£ || Wlloo 
X . uj 
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and the space of functions on S — £ N 

C^S) ={f E C b (S) | \\f\\i < Cf where C f is independent of N } . 

Similarly we define the space of functions on S = S A associated with the infinite 
lattice A = Z d 

C 1 (S) = {f £C b (S)\\\f\\ 1 <^}. 

Because of the estimates in Section \5\ see (|5.8p and (|5. 10[) in Theorem 15.61 we will 
employ spaces with higher discrete derivatives that will be defined in Section [5] On 
the infinite lattice A macroscopic observables are all / £ C 1 (<S). In the case of An, 
macroscopic observables are all / = /(cr) such that ||/||i is independent of the system 
size N; such typical examples are discussed in Section [5j 

Typically, the evolution of the particle system on the infinite lattice A = Z d is 
well-defined, as demonstrated in the next propositions. 

Proposition 2.2. For any f £ C 1 (S) we have that the series 

£/(*) = £ £ c(wM°*n-m], 

x£A uj^Sx 

converges uniformly and defines a function in Cb(S), provided swp x u a c(x, w; cr) < oo. 
Furthermore, 

II Cf Hoc < sup c(x,w;ct-)||/|[i . 



Proof. Follows directly from (|2 . T[) and the definition of C 1 (5). □ 

Proposition 2.3. Under the boundedness assumptions on the rates, the closure 

of the operator C defines a Markov generator for a Markov semigroup P = e tc , such 

that for f £ C^S), Pf 6 C^S) and 

||e"7||i< e rt ||/||i, 

where T is a constant depending on the rates c(x,u>;a). 
Proof. See [HJ Theorem 3.9, pp 27]. □ 

Clearly the same results hold for the finite lattice An and the corresponding 
high-dimensional configuration space 5, where all constants are independent of the 
size N. 

Examples. 

Adsorption/ Desorption for single species particles. In this case spins take values in 
a(x) £ S = {0, 1}, fl x = {x}, S x = {0, 1} and the update represents a spin flip at the 
site x, i.e., for z £ An 



o~(z) if z 7^ x, 

1 — o~(x) if z = x. 



Diffusion for single species particles. The state space for spins is a(x) eS = {0, 1}, 
= {y £ An I | a; — y\ = 1} includes all nearest neighbors of the site x to which a 
particle can move. Thus the new configuration obtained by updating 

the configuration a t ~o from the set of possible local configuration changes {0, 
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using the specific rule, also known as spin exchange, which involves changes at two 
sites x and y G fl x 

{a{z) iiz^x,y, 
a(x) i£z = y, 
a(y) if z = x. 

The transition rate is then written as c(x,Lo;a) = c(x,y;a). The resulting process 
{o~t}t>o defines dynamics with the total number of particles (X^gAn a ( x )) conserved, 
sometimes referred to as Kawasaki dynamics, [3]. 

Multicomponent reactions. Reactions that involve K species of particles are easily 
described by enlarging the spin space to £ = {0, 1, ... , K}. If the reactions occur 
only at a single site x, the local configuration space S x = £ and the update is indexed 
by k G £ with the rule 

I k if z = x. 

The rates c(x,w;cr) = c(x,k;o~) define probability of a transition ct(.t) to species 
k = 1, . . . , K or vacating a site, i.e., fc = 0, over 5t. 

Reactions involving particles with internal degrees of freedom. Typically a reaction 
involves particles with internal degrees of freedom, and in this case several neighboring 
lattice sites may be updated at the same time, corresponding to the degrees of freedom 
of the particles involved in the reaction. For example, in a case such as CO oxidation 
on a catalytic surface, j!6| . when only particles at a nearest- neighbor distance can 
react we set a(x) G £ = {0, 1, . . . Cl x = {y G A.N | | — 2/ 1 = 1 } and the set of 
local updates S x = £ n *. Such S x contains all possible reactions in a neighborhood 
of x. When reactions involve only pairs of species, the rates can be indexed by k, 
I G £, or equivalently S x — £ x S. Then the reaction rate c(x, uj; a) — c(x, y, k, 1; a) 
describes the probability per unit time of o~(x) k at the site x and cr(y) — > I at y, 
i.e., the updating mechanism 

{(j(z) j£z^x,y, 
k ifz = x, 
I if z = y, 

where \x — y\ = 1. 

3. Towards parallel kinetic Monte Carlo algorithms. In practice, the sam- 
ple paths {er t } t >o are constructed by the kinetic Monte Carlo algorithm, that is by 
simulating the embedded Markov chain defined by (|2.2[) and (|2.3p and advancing the 
tine by random time-steps from the exponential distribution. Implementations are 
based on the efficient calculation of transition probabilities, e.g., [3] for Ising models, 
known as a BKL Algorithm, and in [7] known as Stochastic Simulation Algorithm 
(SSA) for reaction systems. 

It is evident from formulas (|2.2|) and (|2.3|) . that KMC algorithms are inherently 
serial as updates are done at one site x G An at a time, while on the other hand 
(|2.2[) depends on information from the entire spatial domain Ajv- For these reasons 
it appears that KMC cannot be parallelized easily. However, Lubachevsky, in (T?] . 
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proposed an asynchronous approach for parallel KMC simulation in the context of 
Ising systems, in the sense that different processors simulate independently parts of 
the physical domain, while inconsistencies at the boundaries are corrected with a 
series of suitable rollbacks. This method relies on the uniformization of (|2.2[) ; thus 
the approach yields a null-event algorithm, |T4j, which includes rejected moves over 
the entire spatial sub-domain that corresponds to each processor, see also [9]. A 
modification in order to incorporate the BKL Algorithm was proposed in |17) . which 
was implemented and tested in [T^]. This is a still asynchronous algorithm, where 
BKL-type rejection- free simulations are carried out in the interior of each sub-domain 
(processor), while uniform rates are used at the boundaries, reducing rejections to 
just the boundary set. However, these asynchronous algorithms may still have a high 
number of rejections for boundary events and rollbacks, which considerably reduce 
the parallel efficiency, |22| . Advancing processors in time in a synchronous manner 
over a fixed time-window can provide a way to mitigate the excessive number of 
boundary inconsistencies between processors and ensuing rejections and rollbacks in 
earlier methods. Such synchronous parallel KMC algorithms were proposed in [5], 
|22j . 1 1 8 j . [19] . However, several costly global communications are required at each 
cycle between all processors whenever a boundary event occurs in any one of them, 
in order to avoid errors in processor communication, [19] . As we will discuss in the 
sequel, many of these issues with parallel KMC can be addressed by abandoning the 
earlier perspective on creating a parallel KMC algorithm with exactly the same rates 
c(x,u);a) in (|2.7|) as the serial algorithm. 

Indeed, in [T|, we adopted the approach of creating a parallel KMC algorithm 
which approximates the underlying continuous time Markov chain of the serial al- 
gorithm instead of reproducing its master equation exactly. We proposed a spatio- 
temporal decomposition for the Markov operator underlying the KMC algorithm into 
a hierarchy of operators corresponding to the processor architecture. Based on this op- 
erator decomposition we can formulate Fractional Step KMC Approximation schemes 
by employing the Trotter product formula. In turn these approximating schemes 
determine the Communication Schedule between processors through the sequential 
application of the operators in the decomposition, as well as the time step employed 
in the particular fractional step scheme. Earlier, in |23j the authors also proposed an 
approximate algorithm, in order to create a parallelization scheme for KMC. It was 
demonstrated in []j|J HO] > that boundary inconsistencies are resolved in a straightfor- 
ward fashion, while there is an absence of global communications. Finally, among the 
parallel algorithms tested in [19], the one in [23] had the highest parallel efficiency. 
In PQ, we demonstrated that the approximate algorithm in }23| is a special case of 
the Fractional Step Approximation schemes introduced in pQ. We also demonstrated, 
using the Random Trotter Theorem, |13j . that the algorithm in |23j is numerically 
consistent in the approximation limit, i.e., as the time step in the fractional step 
scheme converges to zero, it converges to a Markov Chain that has the same master 
equation and generator as the original serial KMC. Finally, the open source SPPARKS 
parallel Kinetic Monte Carlo simulator, [20], also relies on such Fractional Step ap- 
proximations. 

In this article, the convergence, reliability and efficiency of parallel algorithms, 
that fit the Fractional Step KMC approximation framework, are systematically ex- 
plored by rigorous numerical analysis which relics on controlled-error approximations 
in transient regimes relevant to the simulation of extended systems. 
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3.1. Fractional time step kinetic Monte Carlo algorithms. In [T] we pro- 
posed a class of parallel KMC algorithms that are based on operator splitting of the 
Markov generator C which is based on a geometric decomposition of the lattice A at. 

Definition 3.1. The lattice Ajv is decomposed into non- overlapping coarse cells 
C m , m — 1, . . . ,M such that, \C m \ = Q = q d , where d is the dimension, 

M 

A N ={JC m , C m nC„=0, m^n, N = MQ . (3.1) 

m— 1 

The range of interactions is defined as L = max x6( 7 m {diamri x }. For a coarse cell 
C m the closure of this set is 

Cm = {z S A N | \z - a: | < L,x G C m } . 

The boundary of C m is then defined as dC m = C m \ C rn . 

The closure C m thus includes all sites of C m and all "boundary" lattice sites dC m 
which are connected with sites in C' m through particle interactions in the updating 
mechanism, see Figure 13.11 In many models the value of the interaction range L is 
independent of x due to the translational invariance of the model. This geometric 
partitioning induces a decomposition of (|2.7|) 

M 

Cf(a) = ^nf(a) , C m f(a) = ^ E c ^ ^M^) - ■ ( 3 - 2 ) 

The generators C m define a new Markov process {cr™}i>o on the entire lattice Ajv- 

Remark 3.1. In many models such as in catalysis the interactions between 
particles are short-range, [HI 02], an d therefore the transition rates c(x, ui; a) depend 
on the configuration a only through a(x) and o~(y) with \x — y\ < L, where L is 
small (typically one). Similarly the new configuration a x ^ involves changes only at 
the sites in this neighborhood. Thus the generator C m updates the lattice sites at 
most in the set C m = {z \ \x — z\ <L,ie C m }. Consequently the processes {a" x } t >o 
and {cr™ }oo corresponding to C m and C m ' are independent provided C m PI C m ' = 0. 
The operator decomposition yields an algorithm suitable for parallel implementation, 
in particular, in the case of short-range interactions when the communication overhead 
can be handled efficiently: if the lattice A^ is partitioned into subsets C m such that 
diamC m > L, we can group the sets {C m }^f =1 so that there is no interaction between 
sites in C rn that belong to the same group. For the sake of simplicity we assume 
that the lattice is divided into two sub-lattices described by the index sets I 1 and I 2 
(black/red in each block in Fig. 13. 1() . which in turn induce a corresponding splitting 
of the generator: 

A N = Alf U A^ := |J C m U |J C m and 

C = d + C 2 := ]T Ci, m + J] £ 2, m ■ (3.3) 

The decomposition p.3j) has key consequences for simulating the process {crt}t>o 
in parallel, as well as formulating different related algorithms. The processes {<7™}t>o 
corresponding to the generators L\. m are mutually independent for different m e 
I 1 , and thus can be simulated in parallel. Similarly we can handle the processes 
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Fig. 3.1. Lattice •partitioning in i3. 31 ). 
interior boundary of d, see Figure \57J\ 



Note that later we use the notation Cf to denote the 



belonging to the group indexed by X 2 . However, there is still local communica- 
tion/synchronization between these two groups as there is non-empty overlap between 
the groups due to interactions and updates in the sets C m f) C m > when m £ X 1 and 
m! G X 2 and the cells are within the interaction range L. Mathematically, we can 
describe all that through a fractional step approximation of the Markov semigroup 
P = e tc of the process {<Tt}t>o- The operator splitting or equivalcntly the fractional 
step approximation can be also viewed as an alternating dimension approximation 
since we solve the evolution of u(a, t) given as solution of (|2.6j) by alternating between 
evolution of a's in the dimensions corresponding to X 1 and X 2 . 

Indeed, the key tool for our analysis are different versions of the Trotter formula, 

nana, 



lim 



-£ig — £2 



(3.4) 



when applied to the operator £ = L\ + £2 in (|3.3[) . Thus to reach a time T we define 
a time step At = h = — for a fixed value of n and alternate the evolution by C\ and 
£2, giving rise to the Lie splitting approximation for n» 1: 



Pl := 



r e At£i e At£ 2 



where At = =- . 



(3.5) 



To develop a parallelizable scheme we use the fact that the action of the operator £\ 
(and similarly of £2) can be distributed onto independent processing units, indexed 
by m in 



„At£i 



n 

mel 1 



mGl 2 



At£ 2 



Analogously we have the Strang splitting scheme 



e TC *Ps 



e ^ £1 e At£ 2 e ^ £1 



where At = — . 



(3.6) 
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From now on, for the notational convenience, we shall also use h to symbolize At. 

While operator splitting has been exploited in many classical numerical methods, 
e.g., [5], in our context it offers a rigorous framework for extending simple (determin- 
istic) alternating strategies associated with, for example, traditional Lie or Strang 
splittings to more elaborate and randomized Processor Communication Schedules, we 
refer to Section [5] for a complete discussion. 

We characterize the FS-KMC (Fractional Step KMC) algorithm Q3.5P as partially 
asynchronous since there is no processor communication during the period h. Further- 
more, at every h we have only local synchronization between processors, i.e., between 
the sets C m n C m i when m G T 1 and m! G I 2 . Hence, the bigger the allowable h 
in (|3.5[) or in (|3.6[) the less processor communication we have, in which case the er- 
ror in the approximation (|3.5|) or (|3.6[) worsens. This balance between accuracy and 
processor communication in algorithms is one of the themes of this article. 

4. Local and global error analysis. The FS-KMC algorithm approximates 
the evolution of observables u(o~, t) given by the original semigroup P. We present 
an error analysis which focuses on classes of observables such as (|2.5[) instead on 
estimating an approximation of the probability distribution of the process solving 
(12.41) . This perspective is also relevant to practical simulations, where the estimated 
quantity is linked to specific observables, and is simulated by the FS-KMC algorithm. 

To understand the error of this approximation we first analyze the error for the 
two cases of deterministic PCS: the Lie splitting defines a new semigroup (|3.5[) that 
we denote Pl and similarly P$ denotes the semigroup (|3.6|) obtained by the Strang 
splitting. The local error analysis can be treated in a similar way as it is done for 
the finite dimensional case when working on the lattice An by using the property 
proved in [TU]. The estimates for local and global error follow standard steps and 
are presented next for completeness. However, for macroscopic observables that typi- 
cally arise in the simulation of extended KMC systems, we prove estimates which are 
system-size independent in Section [5] Finally, in Section [7] we present, as a comple- 
menting theoretical perspective, the same estimates on the infinite lattice A = Z d , in 
which case the involved generators are necessarily unbounded. 

Lemma 4.1. Let C be the generator of a strongly continuous contraction semi- 
group {e tc }t>o on the Banach space GV Then the operators 

m—l 

V m (tC) = e tc - ]T ^C k , me N+ (4.1) 

fc=0 

satisfy the bound 

||P m (t£Hloo < ^ll^llco, VveC b (4.2) 
Proof, see Jahnke, [10]. □ 

Lemma 4.2 (Local Error). Let Pl(*) and Ps{t) be the schemes \3. 5\) and 13. 6]) 
associated with the Lie and Strang splittings respectively, and let u(h) = P{h)f be the 
solution of i2.6\) . Then the local error for the Lie splitting is 

\\P L (h)f-u(h)\\ 00 <c 1 \\[C 1 ,C2}f\\ooh 2 +c 2 £ H^^r/lloc/i 3 , (4.3) 

|m|=3 
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and for the Strang splitting scheme 

|| P s (h)f - u(h) lU < c 3 || [A, [A, A]]/ - 2[C 2 , [A, A]]/ \\ooh 3 

+ c 4 IIA^a^A™ 3 /]!^ 4 ( 4 - 4 ) 

|m|=4 

where [A, A] = A A - AA denotes the commutator of A and £2 andci, i = 1, ...,4 
are positive constants with a < 1. 

Proof. Using Lemma 14.11 the proof follows the standard finite dimensional ap- 
proach based on the expansion of the operator exponential. We present the calcula- 
tions here for the sake of completeness. In order to simplify the notation, we introduce 

if k < N , 
V k {hC 2 ) if k = N > 0, 
e h£ 2 if fc = TV = . 

(4.5) 

Now the semigroup for the Lie splitting, at t = h, can be written as 
e hU e hC 2f= £ „„ Jlnb^lnf 

i+j<3 

= (l + /j(A + A) + y(A+A) 2 )/ 

+ /i 2 [A,A]/ + J! ai,3-iWbj, a {h)f . 

Comparing with 

e KL 1+ C 2 ) f = (j + h[Ci + ^ + + / + V3{h{Cl + C2))f i 

we get the estimate for the local error 

|| e ^ e ^/ - e KCi+to)f | )oo < ft a|| [A,^]/ 1^ 

+ ||P 3 (MA + A))/||oo + || ]T O'.s-iW&i.aW/lloo. 

The second term in the above inequality is bounded by Lemma |4. II and the third term 
is bounded by 

|| Y ai,3-i{h)b jia (h)f Hoc < c/i 3 (|| C\f IU+II £ 2 A/ IU+II A^/ IU + II A 3 / ||oo) , 

i+j=3 

which follows from the definitions of a k and b k . The last step completes the proof for 
the local error in the Lie case. For the Strang scheme the proof follows the same idea, 



' ^ A if k < N , 
a k ,N(h) = { V k {hd) if k = N > , b k , N (h) 
if k = N = , 



J1C1 
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we only have to take one more term in the expansion, 

i+j+k<A 

= (l + h(d + C 2 ) + y(£i + C 2 ) 2 + y (A + A) 3 )/ 

+ ^([A,[A,A]]-2[A,[A,A]])/ 
+ a k ,4-i-j(~)b jA -i(h)a iA (~)f . 

i+j+k=4 

Comparing with 

e h{C 1+ C 2 ) f = ^ + h{Ci + £a) + y(A + C 2 f 

+ y (A + A) 3 ) / + 2> 4 (MA + A))/ , 
the estimate for the local error follows 

|| e ±C le h£ 2e iC lf _ e h(C 1+ C 2 ) f | !oo < ^3|| [£,,£2]]/ _ 2 [£ 2 , [£ 2 , A]]/ Hoc 

+ || ^ a M ^- J (-)6 J -, 4 -i(/i)a i ,4(2)/llc>o + ||X'4(/i(£i + £ 2 ))/|| 00 . 
The second term is bounded by Lemma 14. II and the third term is bounded by 

|| ]T a fc ,4-i- J -(^)6 A 4-iWOi 1 4(|)/ Hoc < C4^ 4 2 II ^T^T^Tf Hoc , 
i+j+k—4 |m|— 4 

which again follows from (|4.5[) . □ 

After establishing the local truncation error it is straightforward to obtain the 
global error estimate. 

Theorem 4.3 (Global error). Let Ph{t) and Ps(t) be the the schemes h3. 5\) and 
h3. 6V associated with the Lie and Strang splittings respectively and let u(t n ) = P(t n )f 
be the exact solution of \2.6\) . Then the global error at the time T = t n — nh, for the 
Lie splitting is bounded by 

||PL(tn)«(0)-«(t„)||oo<Ci max || [A, C 2 ]u{t k ) W^h + K L (u)h 2 , (4.6) 

k— 0,...,n 

where the remainder is given by 

K L (u)=K L (u;n,h) = C 2 max V || A^ATM^) IU . (4.7) 

k— 0,...,n ^ — ' 

|ro|=3 

and for the Strang scheme 
|| Ps(tn)u(0) - u(t n ) Hoc <C 3 max || ( [A, [A, A]] - 2[A, [A, A]]W fe ) |U^ 2 

fc=0,...,7i \ / 

(4.8) 

+ft s (^ 3 , 
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where 

K s (u)=1l s (u;n,h) = C 4 max V || L™C^£? 3 u{t k ) ^ , (4.9) 

k— 0,...,n ^ — ' 

m|=4 

and Ci, (72,6*3 and C4 are constants, depending only on T. 
Proof. It can be shown by induction that 

n-l 

e„ = P n (h)u(0) - u{t n ) = P\h) (P(h) - P(») p( n - k -V(h)u(0) . 

where P denotes cither Pj, or P5. By the assumptions, the operators C\ and £2 
generate strongly continuous contraction semigroups and thus || P h ||oo < 1, the global 
error is bounded by 

n-i 

|| & n || oo 

fe=0 

<n max || fp(/i) - P(h)) u(t h ) ||oo • 

k— 0,...,n \ / 

Using Lemma T4. 21 for P = Pj, and P = Ps, to estimate the local error and the fact 
that nh = T we obtain the estimates (|4.6|) and (|4.8p for the Lie and the Strang scheme 
respectively. 
□ 

5. Estimates for macroscopic observables. In Theorem 14.31 we have shown 
that the proposed splitting schemes are convergent as the time step h tends to zero. 
The main idea of the proposed scheme is to control an error for observables, in other 
words we estimate the weak error by analyzing solutions of (|2.6[) . If we restrict the 
initial data of the problem (|2.6[) to a special class of functions, then it is possible 
to show that the error terms are independent of the size of the lattice, N , It turns 
out that this is a wide class of function containing some of the most common observ- 
ables in KMC simulations, such as mean coverage or spatial correlations, we refer to 
Section 15.11 

In order to simplify the notation we suppress the dependence of the discrete 
derivative operator 5 X ^ on u> in Definition 12.11 

Definition 5.1. For x = (xx, . . . , x m ) £ we introduce the notation 

S x f(o-) =S Xl ... S Xm f(a) = 6 xl ... Xm f(a) , 

and we refer to it as the discrete derivative of f with respect to x. For example if 
x = (x, y) then 

S xy f(a) = 6J y f(a) = fro**) - f(a x ) - /(a*) + /(a) . 

Definition 5.2. Let x = (xi,...,x m ) e and f e Cb(S). Then we define the 
norm 

11/11™ = E ••• E UMioo, 
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and the function space 

tn 

C m (S) = {/ G C b (S) | 11/11* < Of where C f is independent of N} , Vm G N . 

fc=i 

We refer to elements of C m {S) as macroscopic observables and we will discuss ex- 
amples in Section [5. II We now present the main theorem of this paper, showing that 
for such macroscopic observables, or equivalently under smoothness conditions on the 
initial data, the global error estimates for the Lie and the Strang schemes are inde- 
pendent of the dimension of the system. The proof of this theorem is contained in 
the next two subsections. 

Theorem 5.3. (a) Let u(t) be the solution of (2.6\) with u(0) = f G C 3 (S).Then 
for the global error estimate of Theorem ^. 3\ on the Lie scheme \3.5\) we have 

||Pi(*«M0)-u(i„)||oc <Ci max || [A, C 2 ]u{t k ) W^h + K L (u)h 2 , 

fc=0,...,7Z 

where 

II [C 1 ,C 2 ]u{t k )\\ 00 <C 

and 

n L {u) <c, 



where both constants C and C are independent of the system size N . Moreover, if 
w(0) G C 4 (tS) then for the global error of the Strang scheme 

[A, [A, A]] - 2[A, [A, £i]])«(t*) Hoc < C, 

K s (u)<C, 

where the constants C and C are independent of the system size N . 

(b) Many macroscopic observables u(0) = / are not just in C 3 (iS) but also satisfy a 

local bound such as 

C 

max ||(5 2 u(0, • )||oo + max || S xy u(0, •) + max || 5 xyz u(0, •) < — . (5.1) 

zEA N x,y£A N x,y,z£A N 1\ 

Then the bounds for the commutators become 

Td+l 

||[A,AMV)lloo<<? , (5.2) 



j2d+l 

[A, [A, A]] - 2[A, [A, A]] )u{t k ) Hoc < C , (5.3) 



where = Q = q d and the constant C is independent of N . The parameters L, M, 
N , q are defined in Definition \3.1[ and d is the dimension of the lattice A^r C A = Z d . 

The proof of this theorem is given in Section 15.41 while the supporting results 
are proved earlier in Sections 15.21 and 15.31 Next, we discuss typical examples of 
macroscopic observables / which are used in KMC simulations and also satisfy the 
assumptions of Theorem [ 
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5.1. Examples of observables. There is a wide class of macroscopic observable 
functions in C m (S), that satisfy 

6 x f(<r) ■= jz^ix + kx),--- ,a(x + k e )) .h £ A N ,Vx £ A N , (5.4) 

A class of functions that satisfies (|5.4[) , or more generally (|5.1[) . includes the coverage, 
spatial correlations, Hamiltonians and more generally observables of the type 

= n E U ( a (y + fci )' • • • ' a (y + > ki e An ■ 

yeA N 

These functions have the property that their discrete derivatives depend only on a 
fixed number of points on the lattice that does not scale with N. In this section we 
will show that this class of function belong in C m (S), Mm £ N + . 

Example 5.1 (Coverage). Let /(cr) = a = J2 x <=a n a ( x )i tne observable that 
measures the mean coverage of the lattice A jv ■ Then 

S x f(cr) = ^(a x (x)-a(x)), 
and in the case cr(x) £ {0, 1} it takes the simple form 

The local average over a percentage of the domain, defined as 

x£AcA N 

is also in the same class. 

Example 5.2 (Spatial correlations). Let /(cr; k) = Y^xeA cr(x)a(x + k), the 
mean spatial correlation of length k. Then, when o~{x) £ {0, 1} it takes the form 

S x f(o-) = ^(1 - 2a(x)) (a(x + k) + a(x - k)) . 

In these examples it is obvious that / £ C 1 (5). To such functions we can apply 
Lemma [5.51 and easily conclude that they belong to C m (S) for m < mo, where mo 
depends on the form of the observable. 

Example 5.3. Let / be an observable of type |5.^[) with l= \ and ki = 0, then 



SJ v f(a) = S x ^(a(y)) = ^(a x (y)) - ^(a(y)) = 0, \x -y\ > 1 . 

An analogous result holds when I > 1 and ki ^ with \x — y\ > c(£), where the 
constant depends on I but not on N . 

Finally, there are macroscopic observables that are not of the type (|5.4p but still 
satisfy ([5TTj) 

Example 5.4 (Variance). Let /(cr) = j/Y^xga n ( (T ( x ) ~ = & - & 2 . Then 

1 / , „/ 2a(x) — 1\ 
S x f(a) = - (1 - 2<7(s)) (l - 2a + -A2 j . 

It is easy to verify that variance is in C 2 (S) and satisfies (|5.1I) . 
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5.2. Bounds on the remainder. In order to establish that the remainders 
Hl(u), (|4.7[> . or Tlg(u), (|4.9|) . Theorem l4.3[ are bounded by constants independent of 
N we derive estimates for powers of the operators Ci, £2 and their compositions such 
as C\C 2 . The idea for such estimates is an easy extension of estimates on C 2 acting 
on the solution of p.6[) . which we present next. First, we prove that C 2 u is bounded 
by the sum of first and second derivatives of u. 

Lemma 5.4. Let u be the solution of equation 12. 6]) . Then for the operator C 2 
the following bound holds 

||£ 2 w(i,-)||oo <Cl ll^"(*.-)lloo+C2 HW*(V)l|oo 

x£A N x,y€LA N 

= Pi|[«(t,-)lli + C2||«(t,-)l|a- (5-5) 
Proof. By a straightforward calculation 

£ 2 u(t,a)= ^2 c(x,a)c(y,a x )6 xy u(t,a) - ^ c(x, a)6 m c(y, a)5 y u(t, a) , 

x,y£A N x,y£A N 

and by taking norms on both sides 

IKV)l|oo < || ^2 c(x,-)5 y u(t,-)\\ 00 + c 2 Y || S xy u(t, -)|| OO 

x^An \x — y\<L x,y£AN 

<Cl X! ll*tfW(*)-)l|oo+C2 ^ ll^2/ U (*'')l|oo, 
x£An x,uGAn 

where the first inequality follows from the fact that 8 x c(y,cr) = when \x — y\ > L, 
see Lemma l5.5[ where we show that the derivatives of the rate functions have compact 
support that depends only on the length of the interaction L. □ 

Lemma 5.5. Let c be a rate function with interactions of range L 

c(a, a) = c(a(a — L), , . . , a{a + L)) , a e Ajv, 

then 

5 x c(a, a) = 0, Vi G A^v with \x — a\ > L , 

and 

5 xy c(a, a) = 0, Vx, y £ Ajv with \x — y\ > 2L + 1 . 
Moreover, for all higher derivatives holds that 

n 

S Xl S X2 ■ ■ ■ S Xn f(a) = Y[ SxJ(o-) = , \xi - Xj\ > 2L + 1 , i ^ j . 

fc=i 

Proof. For the first discrete derivative it is sufficient to observe that if x ^ y then 
o~ v (x) = o~{x). Thus when a has distance from x greater than L the rate function 
c(a,o~) is equal to c(a, o x ) and the first derivative is zero. 

For the second derivative, based on the calculation for the first derivative, we 
have 



6 X [ 6 y c(a, a) ) = , \y - a\ > L , 
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or, if we interchange x and y, 

5 y (s x c(a, a)j = , \x — a\ > L . 

Finally, the second derivative is always zero when \x — y\ > 2L + 1. 

For the general case, the proof follows from the fact that 8 x 5 y c{a 1 a) = 5 y 5 x c(a, a) 
and from the following observation 

n 

J\ S Xk (s xi S Xj c(a,a) S j = 0, \xi - Xj \ > 2L + 1 , i^j, 
fe=i 

which is true by the result for the second derivative. □ 

Proposition 5.6. Let u(t,a) be the solution of the equation 12. 6}) with initial 
data in C 2 (S) . Then the operator C? satisfies the bounds, 

||£VV)lloo <C, (5.6) 



||u(t, Olli + IK*, - )lla < CilKO, • )||i + C 2 \\u(0, ■ , 

where C,C\ and C% are constants independent of N . 

Proof. We will bound the right hand side of the equation (|5.5[) thus we need 
estimates on the first and the second derivatives of u. For the sake of brevity we use a 
vectorial notation Cf = c(er) • V a f(a) = J2 X c ( x i °~)dxf(o~)- The governing equations 
for u, v\ = S x u, V2 = S x u and w = 6 x S y u are 

dtu = c(ct) • VcrU 

d t vi = c(a) ■ V a vi + 6 x c(a) ■ \7 a u(a x ) 
d t v 2 = c(a) • Vo-wa + Syc(a) ■ V a u(a v ) 

d t w = c(o-) • V a w + 5 y c(a) ■ \7 a vi(a y ) + 5 x c(a) ■ V<rV 2 (o- x ) + S xy c(a) ■ \7 a u(a xy ) . 
First we bound the first derivative writing the solution for v(t,o~) 

5 x u(t,a) = e ct u(0,a)+ f e^~ s ^ c ^ S x c(y, a)6 y u(s, a x ) ds . (5.7) 

\y-x\<N 

By taking the norms and summing over all x G An 

H$rU(V)||oo< ll*x«(0,-)||oo + Ci / \\SyU(s,-) \\ocds. 

xGA N x£A n x£A N \y— X \<L 



By setting 



we obtain 



V (t) = \\u{t,-)\\ 1 = J2 II lloo, (5-8) 



i6A f 



ip(t) < <p(0)+c! [ ip(s)ds. 
Jo 
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Similarly, for the second derivatives we have, by using Lemma 15.51 

d t S xy u(t,<T) = CS xy u(t,cr) + S y c(z,a)5 xz u(t,a v ) + 8 x c(z,<r)dy Z u(t,a x ) 

\z-y\<L \z-x\<L 

+ X] S xyc(z ) a)S z u(t ) a xy )xc 2L {x,y) , 

\z — x\ <L 
\z-y\<L 

where xc 2 l i s t ne characteristic function and Cil = {(%, y) G | \x — y\ < 2L}. The 
solution of the above equation is expressed as 



6 xy u(t, a) = e tc 5 xy u(0, a) + / e^ c 



Syc(z,a)5 xz u(s,cr y ) + ^ S x c(z 1 a)d yz u(s,a x ) 

\z-y\<L 



-\z — x\ <L 



^2 $xyc(z,a)d z u(s,(T xy )xc2L( x ,y) 



I Z — X | < L 

\z-y\<L 



ds . 



Thus, by using the contraction property of the semigroup and the fact that the discrete 
derivatives of the rates are bounded functions, we have the estimate 

II S xy u(t, •) Hoc < || S xy u(0, •) Hoc + Ci / ^2 II 8 X zu(s, •) Hoods 

"' |*-a;|<L 



C2 y 5Z II <W4 S > lloo ds . 59 . 
c 3 / \\S z u(s,-)\\ 00 xc 2L (x,y)ds. 

JO I ..l^r 



|z-x|<L 

z-y|<L 



By summing over all x, y € Ajv and setting 



we obtain 



#(t) = \\ u (t,-)\\ 2 = ]T \\s xv u(t,-) iu, 



tf(t) < + c 2 / tf(s)ds + c 3 / yj(s) ds 



(5.10) 



where both c 2 and c 3 depend on L but not on N. However, from Lemma [A. II (see 
Appendix) we have 

<p(t) <ci^(0) = ci||u(0,.)||i<G J 

where the last inequality follows from the assumption that u(0,a) € C 1 (iS). Further- 
more, from Lemma lA. II 

m < S 2 i?(0) + c 3 ^(0) , 

and the second term is bounded from the previous argument whereas the first term 
is equal to 



f?(0) = ||«(0,-)||2 <c, 
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which is true because the initial data are in C 2 (S) . Finally, we obtain from Lcmma [5.41 

\\c 2 u(t, -)\\oo <cMt) + c 2 m <c. 

□ 

Remark 5.1. The same result can be obtained if we notice that the function 
v(t, a) = £ 2 u(t,cr) satisfies the equation (|2.6|) . Then the solution can be written as 
v(t, a) — e tc v(0, a) and by taking the norm on both sides we get the estimate 

||£ 2 ^,-)Hoo < ||e t£2 U (0,-)||oo < ||«(0,-)||oo <C, 

where the second inequality follows from the fact that C 2 generates a contraction 
semigroup. However, in order to get bounds for quantities like C1C2U, it is sufficient 
to observe from Lemma 15.41 that 

||£i£ 2 1t(t, -)||oo < ci ^2 \\S xy u(t,-)\\ 00 +c 2 ^ ll^x«(*,-)||oo 

<KV)||i + IKv)l| 2 

and the norms on the right hand side are bounded from Proposition 15.61 

Our last goal for this section is to prove that the remainders in the Lie and the 
Strang scheme, (|4.7[) and (|4.9[) respectively, are independent of the size of the lattice. 
To achieve this, we first have to bound third and fourth powers of combinations of 
the operators L\ and £2 arising in (|4.7[) and (|4.9|) . Then, as in Remark 15.11 using 
a more general form of Lemma 15.41 it is easy to prove that all relevant combinations 
of £1 and £2 are also bounded by constants independent of N. We will present the 
general idea of the proof, rather than showing all the technical details that anyhow 
follow the same idea as in Proposition 15.61 First we give a definition of the discrete 
derivatives that generalizes Definition 15. II 

Definition 5.7. Let x = (xi,..., x m ) € and Vfc e N , k < m define a e A^, 
the k- dimensional multi-index of k-tuples of x. As in Definition \2.1\ the discrete 
derivative with respect to a is 

6 a f(o) = S ai ... ak f(a) , 

and we define S— a f(a) the derivative with respect to all variables inx= (x\, . . . ,x m ) 
that are not contained in a. 

Using this definition we are able to write a general form of the governing equation 
for the m-th discrete derivative, 

d t 6 x u(t,a)= 2^ S a c(a) ■ V a 5- a u{t,o a ) 

0<\a\<m 

= CS x ii(t, a) + 5 ^c{ct) ■ V a S- a u{t, a a ) , (5.11) 

1< \a\ <m 

where the prime in the summation symbol means that we sum over tuples without 
distinguishing the order of the variables, e.g., S xy = S yx . Using this representation 
and the Remark IA.1I we can apply the same idea as in Proposition 15.61 and prove 
bounds for the operator C k , k e N + , with initial data in C k (S). 



20 



G. Arampatzis, M. A. Katsoulakis, P. Plechac 



5.3. Bounds on the commutators. The constants in the local error estimate 
derived in Lemma 14.21 involve bounds on the commutators of the splitting operators 
£i and £2. We prove that these commutators are bounded operators on the spaces 
C m (iS), independently of the system size N. 

The error analysis quantifies the intuitive link of the approximation error to the 
commutator [£i,£2] of the operators £1 and £2. The commutator is directly re- 
lated to the geometric decomposition and the range of particle interactions. In order 
to demonstrate this relation more specifically we discuss an example of Ising-type 
interacting system in which the events (updates) occur only at a single site x £ A^r. 

The error estimates in Lemma T4. 2 1 link the local error to the commutator of the 
operators £1 and £2- Li principle the commutator can be computed explicitly in 
terms of the rates c(x,lu;o~) although general formulae become too complicated and 
impractical. Therefore we give an example for a specific example of single site events, 
i.e., ui = {x}. The example also demonstrates a procedure that is used for more 
involved cases. First we evaluate the commutators associated with the decomposition 
of the lattice into disjoint sub-lattices (Definition 13.11) . 

Lemma 5.8. Let C\, £2 be two operators defined by 

df(a) = £ - /(*)] . and ^f(a) = £ c(x,a)[f(a x ) - f(a)] , 

xeCi xec 2 

and C\,C 2 C An with dist(Ci,C2) > L. Then C\ and £2 commute, i.e., 

[£i,£ 2 ] =0. 



Proof. The proof follows from the straightforward calculation based on the fact 
that c(x,a v ) = c(x,a) when x £ C\ and y 6 C2 or vice versa and f{cr xy ) = f(o- yx ). 
By a direct calculation we get 



£i£ 2 /(<t)= ]T c{x,a)[c 2 f{a x )~C 2 f{o) 



16C1 



ieCi 

= E E <w)<y,cr)(f{o x v)-f{o- x )-f{o-v) + f{o-\ 

yGC 2 16C1 

= E C (J/'0(E c(x,a«)[/(a^)-/K)]- 2 c(x,a)[f(a x )-f(aj 
yec 2 xed 

= J2 c{ y ,a){Cxf{a y )-Cxf(a, 

yec 2 
= C 2 C 1 f(a). 

□ 

Lemma 5.9. Let C\ and C 2 be such that C t = C° U Cf , where G° := {x <E 
Ci I dist (x, {Ci) c ) > L}, where A c is the complement of set A. With further decompo- 
sition C° = C°° + Cf where C°° := {x 6 Q dist (x, {C l ) c ) > 2L} (see Figure\5l\). 
Let Ci = C° + Cf and C° = C°° + £° a , i = 1, 2 be the corresponding decomposition of 
the generator C, then 

= [£*,£$] 
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and 

[£ 1; [A, C 2 \] = [[£f , Cf],Ci] + [£?, [£?, £f]] . 

Proof. The proof of the first statement follows directly from Lemma T5.8I bv ob- 
serving that dist (Cf , Cf ) = 2L and 

dist (CV , C 2 a ) = dist(Cf , C* 2 °) = I . 

For the second statement, using the same lemma, we compute 

[A, [A, £2]] = [£?,£?£a] - >Cf >Cf ] + [£?, [£?,£f]] . 

The first term on the right hand side can be further simplified 

r no fd rd\ po rd rd rd rd no ro rd rd rd ro rd 

l~l ! *-l *-2 J — Lj 2 *"1 *-2 *-l ~ / -1' £ -1 *-2 • £ -l-'-l''-2 

= = [£r + £f,£?]4 

r /~o<9 /*<9] rd 

— l J ~l 7 / -lJ z -2 ' 

where in the second equation we used the fact that £^£1 = ^1^-2 an d m the last 
equation [£? ,£f] = 0. The same procedure leads to simplifying the second term but 
the third cannot be simplified further. Combining all these steps we obtain the result 
of the proposition. □ 
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Fig. 5.1. Sub-lattice partitioning. Note that we use the notation dCi to denote C\C, see also 
Figure\3j\ 

The estimation of the commutator in Theorem 15.31 requires local estimates on 
the first and second discrete derivatives of the solution to the backward Kolmogorov 
equation by the discrete derivatives of the initial data 

Lemma 5.10. The solution of the equation 

d t u = £u, ie(0,T], u(0,ct) = f{a) , (5.12) 

satisfies the bounds 

max \\S x u(t, ■ )||oo < C max H^ufO, • )||oo (5.13) 
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and 

max \\ 5 xy u(t, ■) \\oo <C[ max \\5 x u(0,-)\\oo+ max || 8 xy u(0, •) ||oo] , (5.14) 

x,y£A N x£A N x,y£A N 

where C is a constant independent of N, however, it may depend exponentially on t. 
Proof. Using (|5.7[) and Lemma |5.5[ we have 

||^«(*.-)l|oo<||*xU(0,.)||oo + O(l) / \\S X U(S,-) \\oods 

Jo 

+ I E H<5«tt( Sj .)||ooA». (5.15) 

\m-y\<L 

Here the symbol O is asymptotic in the size of the system N — > oo. Setting j(t) = 
max KeAjv \\S x u(t, ■ )\\oo we have 

\\S x u(t, ^Woo < 7(0) + 0(1) / j(s)ds + 0(y)L f 7(«)d«, 

Jo L Jo 

or 

7(<) < 7(0) + 0(1) / 7(s)ds. 

Jo 

Applying Gronwall's inequality we conclude the proof and obtain the bound 

l{t) < e c * 7 (0) . 

The inequality (|5.14[) follows similarly from (|5.9[) and from Lemma [A. II □ 

The commutator, as shown in Lemma 15 .91 is a localized quantity that depends 
only on the boundary sites of the decomposed sub-lattices. Thus the localized estimate 
in Lemma 15.101 gives us a tool in order to reveal the scaling of the commutator when 
acting on macroscopic observables. 

5.4. Proof of Theorem 15.31 By Lemma T5.91 the commutator can be written 
as [Cf^C®], which due to Lemma \5. 81 is expanded to 

[Ci,£%]u(t,(r) = ^2 c 1 (x,o-)c 2 (y,o- x )S y u(a x ,t) - c 1 (x,a)c 2 (y,cr)Syu(o-,t) 
zeA?,2/eA§ 

\x-y\<L 

- ci (x, cr y )ca (y, a)S x u(a x , t) + ci (x, a)c 2 (y, a)S x u(a, t) . 
On the other hand, by a straightforward calculation, we have 

CfC 2 u(t,a) = ^jT ci(x,a)c 2 (y,<T x )S xy u(t,a) 

\x— y | <L 

- ^ c 1 (x,o-)S x c 2 (y,o-)5 y u(t,a) . 

zeA?,?/GAf 
\x—y\<L 

Taking norms on both sides similarly to Lemma |5"^T1 and using the fact that the rates 
are bounded functions on Ajy x S, 

||[A,/:aM*,OI|oo<C ll*«*«(*.-)lloo + ll<yv«(*.-)l|oo<C, (5.16) 

x£Af,yeA% 
\x-y\<L 
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where the second inequality follows from Proposition 15.61 using the fact that the 
initial data are macroscopic observables, i.e., belong to C 2 (S). Similarly, we obtain 
the commutator estimate for the Strang scheme. 

Next, we turn our attention to (|5.2[) . Many observables are in C 2 (S), but also 
satisfy the local bound (|5.1|) as one can see in Section [531 Under this assumption, we 
obtain from (|5.16j) the bound for the commutator 

||[A,A>MV)||oc <C Yl ll*«V«(*.-)Hoo + ||«»«( f .-)lloo 

xeAf ,3/eAf 

x — y\<L 

<C[ max ||5 XJ/ it(0,-)||oo + max 11^14(0, OHool Y] 1, (5-17) 

L x,y£A N y&A N ' 

xGAf.ySAf 
x — y\<L 

where the second inequality follows from Lemma [5.101 Using the fact that the initial 
data belong to C 2 {S) and satisfy (|5.1|) . as well as that |C^J = c(d)Lq d ^ 1 , where d is 
the dimension, we deduce that 

||[A,£ 2 MV)l|oo<§ £ 1 < ^ x M x c(d)Lg d - 1 x L d = C— , (5.18) 

x£A° ,j/GAf 
\x-y\<L 

where we used the fact that ^ = Q = q d . We note that for more general, non-square 
lattices, the estimate is modified accordingly as the structure of neighbors in the 
calculation of |C^J will evidently change. Finally, the proof of (|5.3[) follows along the 
same lines, noting that the the summation in (|5.18[) is now replaced by summations 
such as 

£ 1 < M x cia^jLq' 1 - 1 xL d xL d . 

xeAf,yeA%,zeA? 
\x — y\< 1 L,\x — z\< 1 L 

6. Processor communication and error analysis. In this Section we ex- 
amine the balance between accuracy and processor communication in the parallel 
Fractional Step KMC algorithms. Our analysis is based on the local and global error 
analysis tools we have developed in this article. 

A key feature of the fractional step methods is what we define as the Processor 
Communication Schedule (PCS), which dictates the order with which the hierarchy 
of operators in (|3.3[) are applied and for how long. For instance, for the Lie scheme 
(|3.5[) the processors corresponding to C\ (resp. C2) do not communicate, hence the 
processor communication within the algorithm occurs only each time we have to apply 
e At£i or e At£ 2 _ p or ting reason, we characterize the FS-KMC algorithms (|3.5|) . (|3.6|) 
as partially asynchronous since there is no processor communication during the period 
At. Furthermore, at every At we have only local synchronization between processors, 
i.e., between the sets C m (~l C m i when m G X 1 and m' G I 2 . Hence, the bigger the 
allowable At in (|3.5[) or in (J376J) the less processor communication we have, in which 
case the error in the approximation ()3.5|) or (|3.6|) worsens. 

In both schemes p.5[) . and (|3.6[) . the communication schedule is fully determin- 
istic, relying on the Trotter Theorem. On the other hand, we can construct general 
randomized PCS based on the Random Trotter Product Theorem, [13] . Indeed, the 
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sub-lattice parallclization algorithm for KMC, introduced in [53], is a particular ex- 
ample of a fractional step algorithm with stochastic PCS. In [53JHD] each sub-lattice is 
selected at random, independently and advanced by KMC over a fixed time window 
At, subsequently a new random selection is made and again the sub-lattice is ad- 
vanced by At, etc. This algorithm is easily recast as a fractional step approximation, 

DP- 

Here we compare the deterministic and randomized PCS from the point of view 
of processor communication and error analysis: we specify the same error tolerance 
TOL for all PCS, which by means of our error analysis selects in each case a possibly 
different time windows At. Larger time windows At give rise to algorithms that have 
less processor communication for the same error tolerance. 

6.1. Randomized processor communication schedules. A generalization 
by Kurtz, |13j . of the Trotter Theorem suggests numerically consistent schemes in 
which evolutions are applied not in a deterministic, prescribed, order but as a ran- 
dom composition of individual propagators resulting in a random evolution. Given a 
pure jump process X{t), with stationary measure /x(d£), and given the infinitesimal 
generators £k we define a random evolution by 

T n {t)f = e To/nC ^e Tl/nC ^ . . . e r "<" t ' /nC «iv(„ t ) / , 

where N(t) is the number of jumps up to time t and are the sojourn (waiting) times 
at the visited states (£oj ■ ■ • )£iv(t))- The random Trotter product theorem yields the 
expectation semigroup 

lim T n (t)f = e tE f, a.s. (6.1) 

n— >oo 

with the generator C characterized explicitly 

£f = I Ctfn(d£). (6.2) 

While the random Trotter formula serves as a motivation for constructing schemes 
in which the evolution of the system, i.e., the process {(Tt}t>0i is approximated by a 
process obtained from a random composition of propagators e AtCfc , the error analysis 
in the spirit of Theorem 15.31 requires more careful inspection of the approximating 
process {7fc/i}fe =0 on the interval [0,T] with T = nAt. 

We present the construction in a simpler case of the independent identically dis- 
tributed random variables that index the individual generators £^. We analyze the 
randomized Lie scheme for the operator splitting given by £ = £±+£2- In the context 
of the parallel FS-KMC the random process X(t) can be interpreted as a stochastic 
PCS. In [T] we demonstrated that the sub-lattice parallelization algorithm for KMC, 
introduced in |23j , is a particular example of a fractional step algorithm with stochas- 
tic PCS. In [23] each sub-lattice is selected at random, independently and advanced 
by KMC over a fixed time window At = h, subsequently a new random selection is 
made and again the sub-lattice is advanced by h, etc. This algorithm is easily recast 
as a fractional step approximation, where we can show that £ = \ (£\ + £2) which 
is a time-rescaling of the original operator £. From the numerical analysis viewpoint, 
our re-intcrprctation of the algorithm in |23j as (|6.1[) allows us to provide a rigorous 
justification that it is a consistent estimator of the serial KMC algorithm. Next we 
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present the local error analysis of randomized PCS and in analogy to Lemma |4.2[ we 
estimate the mean (weak) local error of the approximating 7-process. 

Definition 6.1 (Random Lie splitting). Let Pi(t), i = 1,2, be two Markov 
semigroups with the infinitesimal generators Li and the transition probability kernels 
Pi(t;j,j'). Assume {£,i,£,2, ■ ■ ■} be a sequence of i.i.d. Bernoulli random variables 
with values £ £ {1,2}. We define the random evolution as the process {^khY^Q by 
setting for h > 0, k = 0, 1, 2, . . . , n, and ^2k, &k-i independent 0/ 70, 7/1, ... , 7(fc-i)/ l 

nftikh) I 7(k-i)fc] == (h)P S2k (fc)/(7(afc-i)fc) > ( 6 - 3 ) 

where the transition probability kernel is 

[Pb MPs, (h)f\ (V)=J2J2 PCi ^ 1> rtPb ft V, 7")/(7") • 
7' 7" 

For a given / £ Ct(S) we estimate the quantity E cr [/(cr fc / l )] and ^[fijkh)} where 
the expected values are computed on the corresponding probability spaces associated 
with each process and conditioned on the initial states o-q = a and 70 = 7 respectively. 
We denote the initial states by different letters in order to distinguish between these 
two different probability path measures, however, the initial state is assumed to be 
same for both {er t } t > and {"/kh.}k =0 - 

Theorem 6.2 (Local Error). Assume P(f fe = l) = P (£ fc = 2) = \, for the 
approximating process {7fch}5J =0 of DeHnition \6.i[ Then for any f G Cb(S) and given 
At = h > 0, the exact process {o~t}t>o with cfq = 70 = 7 corresponding to the generator 
i£ satisfies 

E~> [/( 7/i )] - E ff [f{a h )\ = E« [(P 6l (h)P i2 (ft)/( 7 ) - «( 7 , h))} 



4 £ " 



■E« 
2 



£| 1+ 4 2 +2£ 6 ^ 2 -i/: 2 



where 14(7, /i) = P{h)f{^) is the solution of the rescaled, by 1/2, equation $K 

d t u((,t) = ±£u((,t), u(C,0) = /(0- (6-4) 

Proof. We estimate the local truncation error following similar steps as in the 
deterministic case. From the definition of the 7-process we have 

K'[f( 7h )]=E^[P il (h)P i2 (h)m], 

and similarly, using the fact that the initial states are same, Co = 70 = 7, 

E°[f(a h )]=P(ti)f( 7 ) = u( 7 ,h). 

Hence we obtain a representation of the mean local error 

W[f{ lh )\ -Wiffa)] =E« [(^(^(/i)-^))/^ . (6.5) 

Now for given realizations of £1, £2 we have the expansion of P^ (h)P^ 2 (h) — P(h) as 
in the deterministic case, thus obtaining 

[P (l (h)P (2 (h) - P(h)]f = 

1 h 2 1 (6-6) 
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Note that i£ = \L\ + \C 2 is associated with the process {<r t }t>o- We have that the 
leading term of the local truncation error is E^fjC^ + — ^C] and thus this term 
vanishes whenever i£ = E^fjC^ + £f 2 ], which holds true when P(£k = 1) = P = 
2) = i.D 

Remark 6.1. For Bernoulli variables £j with probabilities p this means that 
the 7-process approximates a process with the generator pC\ + (1 — p)C2 instead of 
\L = \C\ + \Li. This indicates that the usual order of the Lie splitting is achieved by 
properly weighing the time steps, i.e., applying P\{h\) and P 2 (h 2 ) with different time 
steps h\ and h 2 respectively. This calculation also shows that if we want to obtain 
the generator C instead of \L in Lemma 16.21 then in order to evolve the process a 
by the time step h, each semigroup P^ , P^ 2 needs to be applied with the time step 
2h, giving rise to the approximating process jh- In this case we have the local error 
representation 

W[f{ lh )]-W[f{a h )} : = E« [{P 6l (2h)P i2 (2h)f(j) - u( 7 , h))} 

= yE« [ACl + 4Cl + 8£ Cl £ & - C 2 } /( 7 ) (6.7) 
+0(h 3 ) , 

where u(j, h) = P(h)f(j) is the solution of (|2.0I) . 

6.2. Comparison of deterministic and random schedules. The presented 
error analysis allows us to evaluate and compare deterministic (Lie and Strang) PCS 
introduced in [T], as well as randomized PCS such as the one in Lemma introduced 
earlier in [23]. We compare the deterministic and randomized PCS from the point 
of view of processor communication and error analysis by specifying the same error 
tolerance TOL for all PCS which, by means of our error analysis, selects in each case 
a possibly different time window At. Larger time windows give rise to algorithms 
that have less processor communication for the same error tolerance. We start with 
the Lie and Strang schemes. 

We fix the same error tolerance level TOL in the Lie and Strang global errors (|4.6p 
and (|4.8p respectively. We also fix the same time window T = n^At^ and T = nsAs 
where At^ and Ats are the respective time steps of the Lie and the Strang schemes 
that will ensure the same tolerance level TOL up to time T. Based on Theorems 14.31 
and 15.31 we have that the leading errors are governed by the commutators 

TOL~C Lic (T)Ai Lio , C Lic (T)= max || [C u C 2 ]u(t k ) IU , (6.8) 

k— 0,...,n 

and 

T0L~C Strang (T)A4 rang , (6.9) 
Cstrang(T)= max || f [&, [&, C 2 }} - 2[C 2 , [C 2 , &]]) u(t k ) , 

k— 0,...,n \ / 

where u = u(t) solves (|2.6p . Furthermore, due to (|5.2p and (|5.3p we have that 

/L d+1 \ L 2d+1 \ 
TOL-0 Ai Li0 , TOL~0( A4 rang . (6.10) 



In the case of the randomized PCS the same reasoning as in Theorem 14.31 allows us 
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to iterate the mean local error (|6.7|) to obtain 
TOL - Cr 

andom (T)Af Random , 

CRandom(T) = ^gax E £ [4C| + AC% + 8C (l C S2 - C 2 ] u(t k ) , (6.11) 

where u = u(t) solves (|2.6p . We now easily obtain that 

E s [4£^ + 4£| 2 + 8£ 6 £ 6 - C 2 ] u{t) = [AC\ + 4C 2 + C 2 ] u(t) . 

Thus, due to the rigorous remainder bounds in Section [5] on the solution of (|2.6[) such 
as Lemma [5751 we have that the term || [4£f + 4£| + £ 2 ] u ||oo is of order 0(1) in the 
system size N, and we have 

TOL ~ 0(l)At Random • (6.12) 

In order to achieve the same error tolerance TOL, (|6.10p and (|6.12[) imply the 
following relation between the respective time steps 

L d+1 

5t SSA < At Random — At Uc < At Lic - i d A^ trang < Aigtrang ■ (6.13) 

Here q is the diameter of each of the cells Ck in Figure I3TT1 and StssA = 0(1/N) is the 
stochastic time step (the waiting time) of the SSA algorithm [7], which is exponentially 
distributed according to ()2.2[) . 

The relation (|6.13p has several practical implications. 

(i) The selection of the time window At in each PCS is intrinsically goal-oriented in 
the sense that it depends directly on the macroscopic observable /(cr) through 
the commutator estimates of the solution to (|2.6[) . 

(ii) The random and deterministic PCS studied here are rigorously partially asyn- 
chronous as their respective time windows are much larger than the SSA time 
step StgsA for a given error tolerance. 

(iii) The Lie scheme (|3.5[) is expected to parallelize better than the randomized PCS 
in [55] when L d+1 <C q, since it allows a g-times larger time step At for the same 
accuracy. This outcome is also demonstrated in Figure [6~T1 

(iv) Finally, among the PCS we studied, the Strang PCS yields parallel schemes with 
the least processor communication, at least when L ~ 0(1), due to its higher 
order accuracy and the commutator estimate (|5.3|) . 

Example 6.1. We demonstrate this comparison in a computational example in 
which a jump process defined by Arrhcnius spin-flip dynamics on a one-dimensional 
lattice was simulated. The simulated system corresponds to the Ising model with 
nearest-neighbor interactions and spins taking values in {0, 1}. The rate of the process 
is give by 

c(x,a) = c d (l - a(x)) + c a a(x)e- fiu ^ , 

where U(x) = J(a(x — 1) + a(x + 1)) + h, and Cd, c a , /3, J, h are the parameters of 
the model. 

We verified the theoretical order of convergence by computing the error 

f \E[C(t)]-E[C(t)]\dt 
Jo 
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where C(t) and C(t) are the reference KMC and the FS-KMC solution, respectively, 
obtained by averaging the spatial mean coverage process C(t) = J2 x eA N a ti x ) °f 
the system over K independent realizations. For the reference solution, the classical 
stochastic simulation algorithm (SSA) was used. In order to eliminate the impact of 
the statistical averaging error K = 10 5 independent samples were used. The error 
bars are below resolution of the graph depicted in Figure 16.11 In Figure 16.11 the 
error behavior is compared for different values of the splitting time step h = At for 
the randomized PCS and the Lie splitting. The lattice size is N = 800 and the 
parameters of the system are /3 = 15, J = 0.37, h = 0.5 and c a = Cd = 1. For the 
fractional step algorithm four processors were used, thus the size of the sub-lattice is 
q = 100. The final time is chosen to be T = 4. 




Fig. 6.1. Convergence of the weak error for deterministic and randomized Lie splitting. 

Example 6.2. In this example we investigate the dependence of the weak error, 
as defined in the previous example, on the sub-lattice parameter q. The model we used 
to run the simulation is Ising model, as described in Example 16.11 The parameters 
for the model are (3 = 5, J = 1, h = 0.5, and c a = Cd = 1. The final time is 
chosen to be T = 5 and the dimension of the lattice N = 480. For the FS-KMC 
algorithm a constant, and rather large, time step parameter At = 5 was used. For 
the FS-KMC algorithm we used K = 10 4 samples to compute the mean value of the 
solution on the interval [0, T] and for the reference solution, which was obtained with 
the SSA algorithm, K = 10 5 samples were used. In Figure 15721 we can observe that the 
deterministic schedules of Lie and Strang give better results than those of the random 
PCS. Also the Strang scheme has lower error than the Lie scheme as expected from 
the theoretical analysis. Finally, the dependence of the error on | is also revealed, 
which in logarithmic scale is shown as a straight line. 

7. The infinite volume limit. In this paper we considered interacting particle 
systems defined on a d- dimensional lattice A^r, as the numerical analysis and simu- 
lations for the parallel fractional step Kinetic Monte Carlo are performed on a finite 
lattice of size TV. However, given the size of real molecular systems it is necessary that 
numerical estimates are independent of the system size as we showed in Section [5j 
Alternatively we can consider the case — >• oo, e.g., by setting up our analysis on the 
infinite lattice A = 7L d . We outline the latter approach here for completeness of our 
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Fig. 6.2. Dependence of the weak error on the sub-lattice size parameter q, see also (6.13V . 

analysis. We refer to [15] for a comprehensive study of interacting particle systems 
set on infinite lattices. 

First, we consider the configuration space S = £ A , where A = Z d and the space 
of bounded continuous functions C b (S). Then the generator (|2.7I) is defined on a 
suitable domain £>(£), 

C : D[L) C C b (S) ^ C b (S) . 

In this case, Theorem 14.31 is restated similarly to Theorem 3 in [10] , provided the 
solution u = u(t) of flU) satisfies u(t k ) e T>(jC^jC^ 2 ) for |m| < 3 and k = 0,...,n. 
As it was also pointed out in [ 1 Oj this is in principle an uncheckable hypothesis. 
However, this is not the case here: due to the results of Section [5] we have that if 
/ 6 C 2 (S) is a macroscopic observable, where 

m 

C m (S) :={f€C b (S)\ ^||/|| fc <oo},VmeN, 

k=l 

then u(t) S V^C™ 1 £™ 2 ) due to Theorem [O] and Remark [S~T1 Therefore, all estimates 
of Section [5] hold also true in the infinite lattice A, which is certainly not unexpected 
since all previous results in Ajv were independent of the system size N. 

8. Conclusions. In this paper, we derived numerical error estimates for the 
Fractional Step Kinetic Monte Carlo (FS-KMC) algorithms proposed in [1] for the 
parallel simulation of interacting particle systems on a lattice. These algorithms 
have the capacity to simulate a wide range of spatio-temporal scales of spatially 
distributed, non-equilibrium physiochemical processes with complex chemistry and 
transport micro-mechanisms, while they can be tailored to specific hierarchical paral- 
lel architectures such as clusters of Graphical Processing Units. A key aspect of our 
approach relies on emphasizing a goal- oriented error analysis for macroscopic observ- 
ables (e.g., density, energy, correlations, surface roughness), rather than focusing on 
strong topology estimates for individual trajectories or estimating probability distri- 
butions solving the Master Equation (Forward Kolmogorov Equation). Our analysis 
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also addresses earlier work on parallel KMC algorithms [231 [20] that fit into the FS- 
KMC framework. Furthermore, moving beyond the parallelization problems discussed 
here, it appears that these methodologies, introduced in Section [SJ can be generally 
useful in the development and study of numerical approximations of molecular and 
other extended systems. Our error analysis allows us to address systematically the 
processor communication of different parallelization strategics for KMC by comparing 
their (partial) asynchrony, which in turn is measured by their respective fractional 
step time-step for a prescribed error tolerance. 
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APPENDIX 

Appendix A. A general form of Gronwall's inequality. 

For the sake of completeness we prove a variant of Gronwall's lemma for a partic- 
ular case that appears in the proof of Proposition 15.101 We prove it in the presence 
of two equations, but the result can be easily generalized for a system of equations. 

Lemma A. 1 (Gronwall's inequality) . Let ■& and 93 satisfy the following inequalities 

cp(t)<cp(Q)+ f v(s)ds 
Jo 

«?(£)< #(0)+ / ti(s)ds+ I ip{s)ds 
Jo Jo 

then 

<p(t) < eV(0) (A.l) 
0(t) < e*0(O) + (e* + ie* - l)p(0) (A.2) 

Proof. The first estimate follows directly from Gronwall's inequality. By integrat- 
ing this inequality on [0, t] 

<p(s) < (e*-%(0), 
and by substituting this to the second inequality we obtain 

7?0) < 0(0) + (e* - 1)93(0) + / iJ(s)ds. 
If we multiply by e~* and integrate on [0, t] we have 



o 







e 



<d{r) dr<(l- e _t )0(O) + (e* + te l - 1)^(0) 



and after straightforward calculations 

&(t) < e*0(O) + (e* + te} - 1)93(0) . 

□ 

Remark A.l. Let $(t) = (<fi(t), <p n {t)) satisfying 

$(*) < $(0) + f A$(s)ds 
Jo 
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where A is a constant lower triangular matrix and the inequality has the meaning 
that it is true component-wise, then 

S(t) < fl(t)S(0) , 

where B is a lower triangular matrix with elements exponentially depending on t. 



